Familia: A Configurable Topic Modeling Framework for Industrial Text Engineering
Authors
Abstract
In this paper, we propose a configurable topic modeling framework named Familia. Familia supports an important line of topic models that are widely applicable in text engineering scenarios. In order to relieve the burdens of software engineers without knowledge of Bayesian networks, Familia is able to conduct automatic parameter inference for a variety of topic models. Simply through changing the data organization of Familia, software engineers are able to easily explore a broad spectrum of existing topic models or even design their own topic models, and find the one that best suits the problem at hand. With its superior extendability, Familia has a novel sampling mechanism that strikes a balance between the effectiveness and efficiency of parameter inference. Furthermore, Familia is essentially a big topic modeling framework that supports parallel parameter inference and distributed parameter storage. The utilities and necessity of Familia are demonstrated in real-life industrial applications. Familia would significantly enlarge software engineers' arsenal of topic models and pave the way for utilizing highly customized topic models in real-life problems. The source code has been released on GitHub at https://github.com/baidu/Familia/.
Similar resources
Familia: An Open-Source Toolkit for Industrial Topic Modeling
Familia is an open-source toolkit for pragmatic topic modeling in industry. Familia abstracts the utilities of topic modeling in industry as two paradigms: semantic representation and semantic matching. Efficient implementations of the two paradigms are made publicly available for the first time. Furthermore, we provide off-the-shelf topic models trained on large-scale industrial corpora, inclu...
Topic Modeling and Classification of Cyberspace Papers Using Text Mining
The global cyberspace networks provide individuals with platforms to interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases
As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. While online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand,...
An Investigation about the Appropriate Stochastic Modeling Framework for Agricultural Insurance Pricing
Given that agricultural crop insurance in Iran is largely supportive in nature and reported losses generally exceed the premiums received, this thesis uses shot noise processes as a suitable model for pricing agricultural crop insurance (rainfed wheat). Based on the Agricultural Insurance Fund's data on losses reported for rainfed wheat in the 1388-1389 crop year, in this thesis the net and gross premiums of this crop...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-73200-4_36